(57) Abstract: A method of hyperlinking images in a set of images by generating a structure tree for each image and matching 
attributes of an image to identify similar structure trees. The structure tree is generated by processing each image to identify nodes 
and compiling the nodes into the tree in the form of a searchable metadata file. The method generates direct hyperlinking between 
images without overlays. 

HYPERIMAGE SYSTEM

This invention relates to a method for processing images into a form
suitable for hyperlinking. In particular, it relates to a process for generating
hyperlinks between images in a set of images. The invention provides a hyper-
image system that facilitates uninhibited navigation between images.

BACKGROUND TO THE INVENTION 

The concept of hyperlinking is familiar to people who routinely access the
World Wide Web (the Web). A hyperlink is an active redirection from one
location to another location, usually related. Hyperlinking is not restricted to the 
Web but is often found in, for example, large text documents and electronic 
encyclopedias. Hyperlinking is intrinsic to the creation of multimedia works. 

Hyperlinking has been, in essence, restricted to hypertext linking.
Although existing hypertext systems support images, they are normally only 
embedded in the body of the text in a non-interactive way. In some cases, 
particularly multimedia encyclopedias, images (static and video) are destinations 
that are not forward linkable. In some cases it is possible to link forward from an
image by treating the image as an iconic button, but this does not provide
hyperlinking from the image medium. This constraint is due to images being 
unstructured bit arrays that do not have internally referenceable components for 
the formation of hyperlinks. 

The familiar hypertext mark-up language (HTML) is a system for linking
text in a document to another location. Systems such as HTML provide
extensions that allow for linking to and from images. However, this is achieved 
by production of a linkmap overlay. The overlay is manually generated and 
specifies hot regions serving as anchor points. While providing the semblance 
of image linking, the linking is not intrinsic to the image. 

For the production of true hyperlinked multimedia systems, there is a
need for a hyperimage system that provides hyperlinking of images in a similar
manner to existing hypertext systems.

It is also desirable for a hyperimage system to be able to process a set
of images automatically. Automatic hypertext systems have been developed to 
address the heavy authoring load associated with producing a large multimedia 
document. Although existing systems are somewhat crude, generally requiring
human intervention to achieve an aesthetically acceptable result, they
nonetheless reduce the authoring overhead. In the case of images however, the 
highly abstract nature of the information conveyed intensifies the role of the 
authoring system, since semantic interpretation of the data is required in 
addition to the extra handling complexity due to its multifarious nature. While this
means that the development of an automatic authoring process is much more
difficult for hyperimages, it also means that the introduction of automatic image 
hyperlinking is much more advantageous. 

The process of authoring (or creating) hypermedia documents requires 
three distinct phases. These phases are node, link and presentation authoring.
Node authoring involves preparing the raw or unstructured source information
into the desired nodes. Each node typically encapsulates a single concept or 
semantic entity. Link authoring is the process of forming relationships between 
the nodes by creating links and anchors. Presentation authoring defines the 
rendition of the data, including such things as layout, font selection and
formatting, both within individual nodes and for the overall system.

In the following description, a convention is adopted that an image can 
be described by a collection of objects, or nodes. Objects, or nodes, are a 
specific grouping of semiotic signs. Each sign is made up of sub-signs having 
no semantic value but which can be identified as homogeneous regions
demarcated by either observed or implied luminance edges.

OBJECT OF THE INVENTION 

It is an object of the invention to provide a method for processing an 
image into a hyperlinkable form. 

It is a further object of the invention to provide a process for generating
hyperlinks between images in a set of images.

Other objects will be evident from the following discussion. 

SUMMARY OF THE INVENTION

In one form, although it need not be the only, or indeed the broadest, form,
the invention resides in a process for generating hyperlinks between images in
a set of images including the steps of:
processing each image to identify nodes within the image;
characterising each node in terms of a discrete set of attributes;
compiling a searchable metadata file of all sets of attributes for all images;
searching the metadata file to identify a node having attributes most closely
related to attributes of a selected node; and
forming a link between the identified node and the selected node.

Suitably, the step of identifying nodes further includes the steps of:
isolating elementary components in an image;
local grouping of the components into signs; and
global grouping of the signs into objects that collectively describe the image.

The step of isolating elementary components may suitably be performed
by a region growing technique in which pixels are added to a region according
to a homogeneity criterion and edge information.

The step of local grouping preferably includes the steps of segmenting
the components and edge analysis. Gestalt grouping principles are preferably
employed.

The step of forming links between identified nodes and selected nodes 
may occur dynamically or statically. 



BRIEF DESCRIPTION OF THE DRAWINGS

Preferred embodiments of the invention are described with reference to
the following figures in which:

FIG 1 is a schematic of a first embodiment of a hyper-image system;
FIG 2 is a schematic of a second embodiment of a hyper-image system;
FIG 3 shows the steps involved in object demarcation;
FIG 4 shows an original set of images to be used for exemplifying the
invention;
FIG 5 shows the effect of segmentation of one of the images of FIG 4;
FIG 6 shows the steps involved in third phase grouping;
FIG 7 shows the effect of segmentation and edge enhancement; and
FIG 8 shows the effect of segmentation and grouping.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The components in the hyper-image system are partitioned into three
main sections. The first section deals with object metadata file population, the
second with link creation and the third deals mainly with user interface and
navigation support. Typically the object metadata file population occurs in batch
mode, with the user supplying a large set of images to the system, which
processes them automatically by identifying objects, extracting their
attributes and structure, and storing these in the object metadata file. When
complete, users are free to surf through images by simply clicking anywhere
within the displayed image area.

In a first operating mode, depicted in FIG 1, a set of images 1 is
processed 2 to identify nodes (objects). The nodes are further analysed 3 to
extract object attributes for populating the node metadata file 4. An associative
linking process 5 is automatically performed according to the similarity measures
described in a configuration file 6 that determines the relative weight to be
accorded to each of the object's features, such as shape, colour and structure.
This linking stage is performed in batch mode, forming a static hyperweb for the
image data set. After its completion authors are free to manually edit the links
according to their needs. Hit testing and object matching are performed 7 to
confirm the hyper-image linking for the user interface presentation 8. As
mentioned above, the user 9 is then free to surf through the images 1 by clicking
anywhere within the displayed image area. This process isolates the node
demarcation from the presentation layer so that it is not dependent on the user
interface.

In a second operating mode, displayed in FIG 2, the images 1 are
processed 2 and analysed 3 to populate a node metadata file 4, in the manner
described above. Hit testing 7 is performed on the displayed image to determine
what object the user selected, and its properties are retrieved from the metadata
file 4 to search against. Once the search criteria have been retrieved, a full
search is made of the metadata file 4 to identify the image that contains the most
similar matching object. The image that contains that object is then loaded and
displayed ready for the user to continue surfing.

Navigation in the second embodiment is significantly slower than in the first
but affords the user the ability to reconfigure the attribute weightings for each
search. In a further embodiment, right clicking on any image object will bring up
a list of possible link destinations (image objects) ranked in order of similarity to
the current object. This allows users more control over navigation by providing
multiple target destinations. This mode also permits users to start navigation
from an object in an image that is supplied by the user at run-time and hence
has not been registered with the system. In this case the system first
demarcates the boundaries of the object clicked on by the user and extracts
object attributes to be used to search against.

In order to process an image into a hyperlinked form, the significant
nodes within an image must be identified. The nodes must be significant in terms
of human vision rather than conventional machine vision. The aim is not to
recognise an image, but rather to generate nodes that are likely to be significant
to a human viewer so that when a viewer clicks a part of the image a sensible
hyperlink occurs.

The most important perceptual cues in images are luminance boundaries.
Special cells exist in the brain to detect luminance boundaries and to perform
complex processing in the presence of occluded contours. One reason for their
importance in perception is that they are used to define regions in the perceived
image, the properties of which are often extrapolated from luminance edges via
a filling-in mechanism. Hence, the basic building blocks, or sub-signs, in an
image are homogeneous image regions that are demarcated by either observed
or implied luminance edges. Signs are groups of these regions having specific
properties in relation to each other. Semantic objects in an image are specific
groupings of these signs.

Referring to FIG 3, it can be seen that the method of identifying nodes
consists of three distinct phases. The first phase involves image segmentation
by isolating elementary components from the data that are not in themselves
significant. The data may be any suitable image data including captured video
data. The second phase requires local grouping of the components into
significant entities (signs). These signs are not necessarily meaningful of
themselves and must undergo a third global grouping phase to produce objects
with the capacity to be truly meaningful in a semantic sense. At no stage in the
process is any meaning or interpretation attached to any image component.

Further description will be made with reference to various images. The 
original images are shown in FIG 4. Later figures display one or more of these 
images after various stages of processing. 

A region growing technique in the HVC colour space is preferred for the 
first phase. Other techniques that could be used include quad-tree split and 
merge, histogram splitting, and watershed algorithms. These techniques will be 
known to persons skilled in the art. 

In the preferred embodiment, the region growing starts by selecting a 
pixel to which neighbouring pixels are joined until no more pixels can be added 
according to some homogeneity criterion. A suitable homogeneity criterion is that
the intensity difference between the pixel and the closest region boundary pixel 
is below some threshold. The growing procedure is started from the top left 
corner of the image and the growing process is performed continuously until no 
more pixels can be added. When this occurs, a new region is then created by 
starting on the closest unassigned pixel. This continues until all pixels in the 
whole image have been assigned to a sub-sign. 
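
By way of illustration only, the region growing described above might be sketched in Python as follows. The function name grow_regions, the use of 4-connected neighbours, and the comparison of each candidate pixel with the adjacent region pixel from which it is reached are assumptions made for the sketch. New regions are seeded in raster order, which is a simplification of starting on the closest unassigned pixel.

```python
from collections import deque


def grow_regions(image, threshold):
    """Label a 2-D grid of intensities by simple region growing.

    Returns (labels, region_count). A pixel joins a region when its
    intensity differs from an adjacent region pixel by less than
    `threshold` (an assumed form of the homogeneity criterion).
    """
    rows, cols = len(image), len(image[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    # Raster scan, so the first seed is the top left corner of the image.
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] != -1:
                continue  # already assigned to a sub-sign
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            # Grow until no more pixels can be added to this region.
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == -1:
                        if abs(image[nr][nc] - image[r][c]) < threshold:
                            labels[nr][nc] = next_label
                            queue.append((nr, nc))
            next_label += 1
    return labels, next_label
```

Every pixel ends up in exactly one region, so the returned label grid plays the role of the sub-sign map used by the later grouping phases.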

The HVC colour space was used instead of luminance images alone 
because images with low luminance contrast are difficult to segment correctly 
unless chrominance or saturation variations are exploited. The RGB colour 
space provides no advantages over the luminance alone since the RGB values 
are highly correlated and don't truly reflect the magnitude of perceived colour 
variations. The HVC colour space is well recognised for its successful 
correspondence to human colour perception. It represents colour in terms of hue 
(H), value or intensity (V) and chroma or saturation (C) that describes the colour 
purity. FIG 5 shows the effect of segmenting each of these components 
separately for the fourth image of FIG 4. FIGs 5a-f show respectively hue 
component, value component, chroma component, hue segmentation, value 
segmentation and chroma segmentation. 

The combined HVC colour segmentation method based on region
growing uses a cylindrical colour distance measure in place of a simple intensity
difference. The colour distance between two pixels $i$ and $j$ (which may be
normalised appropriately) is given by the following (where $v_i$, $c_i$, $h_i$ and
$v_j$, $c_j$, $h_j$ are the value, chroma and hue at pixel locations $i$ and $j$):

$$d_{ij} = \sqrt{(v_i - v_j)^2 + (c_i \sin(h_i) - c_j \sin(h_j))^2 + (c_i \cos(h_i) - c_j \cos(h_j))^2}$$
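
The cylindrical colour distance may be sketched in Python as below. This is a minimal illustration that assumes hue is supplied in radians and that each pixel is a (hue, value, chroma) tuple; the function name colour_distance is hypothetical.

```python
import math


def colour_distance(p, q):
    """Cylindrical distance between two HVC pixels.

    Each pixel is a (hue, value, chroma) tuple with hue in radians
    (an assumption of this sketch). Hue and chroma are treated as
    polar coordinates in a plane; value is the height of the cylinder.
    """
    h_i, v_i, c_i = p
    h_j, v_j, c_j = q
    dv = v_i - v_j
    ds = c_i * math.sin(h_i) - c_j * math.sin(h_j)
    dc = c_i * math.cos(h_i) - c_j * math.cos(h_j)
    return math.sqrt(dv * dv + ds * ds + dc * dc)
```

With zero chroma the hue terms vanish, so two grey pixels differ only by their value, which matches the intuition that hue is meaningless for unsaturated colours.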

In one embodiment, the threshold is set automatically using a linear
combination of the mean and standard deviation of the intensity gradient values
in the source image according to the following equation:

$$t = k_0 + k_1 \mu + k_2 \sigma$$

where $\mu$ is the mean, $\sigma$ is the standard deviation, and the constants
$k_0$, $k_1$ and $k_2$ are obtained experimentally to be 3.6, 0.36 and -0.143
respectively, using least square optimisation from the sample set of images
shown in FIG 4.
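
As a worked illustration, the automatic threshold can be computed as follows. The sketch assumes the gradient values are already available as a flat list and that the population standard deviation is intended; the text does not say how the gradients are computed or which form of standard deviation is used. The function name auto_threshold is hypothetical.

```python
from statistics import fmean, pstdev

# Constants as listed in the text, in the order given there.
K0, K1, K2 = 3.6, 0.36, -0.143


def auto_threshold(gradients):
    """Return the threshold as constant + K1 * mean + K2 * deviation."""
    mu = fmean(gradients)      # mean of the intensity gradient values
    sigma = pstdev(gradients)  # population standard deviation (assumed)
    return K0 + K1 * mu + K2 * sigma
```

Because the coefficient on the standard deviation is negative, an image whose gradients vary widely receives a lower threshold than a uniform image with the same mean gradient.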

The second phase groups the sub-signs according to local criteria and
exploits the Gestalt grouping principles of similarity, proximity, good
continuation and closure. These principles are applied by analysing the
nature of a number of characteristics of neighbouring sub-signs and the
boundaries between them.

The second phase can be considered in two stages. The first grouping
stage is a region labelling and connected component analysis process that
simultaneously eliminates segmentation artifacts and performs additional
texture-based grouping. For example, areas that are smaller than 20 pixels
are considered to be insignificant and are merged into surrounding areas.
Adjacent areas that are also less than 20 pixels are grouped into a textured
region. The selection of 20 pixels is entirely arbitrary.
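
The merging of insignificant areas might be sketched as follows. This is an illustration under stated assumptions: regions are given as a 2-D grid of integer labels, a small region is merged into the large neighbour with which it shares the most 4-connected border pixels, and the texture grouping of adjacent small areas is omitted. The function name merge_small_regions is hypothetical.

```python
from collections import Counter


def merge_small_regions(labels, min_size=20):
    """Merge regions smaller than `min_size` pixels into a large neighbour.

    A small region takes the label of the large neighbouring region with
    the longest shared border (an assumed rule). Small regions with no
    large neighbour are left unchanged.
    """
    rows, cols = len(labels), len(labels[0])
    sizes = Counter(v for row in labels for v in row)
    borders = {}  # small label -> Counter of bordering large labels
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            if sizes[a] >= min_size:
                continue
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < rows and 0 <= nc < cols:
                    b = labels[nr][nc]
                    if b != a and sizes[b] >= min_size:
                        borders.setdefault(a, Counter())[b] += 1
    target = {a: cnt.most_common(1)[0][0] for a, cnt in borders.items()}
    return [[target.get(v, v) for v in row] for row in labels]
```

The small regions that remain after this step are exactly those surrounded only by other small regions, which makes them natural candidates for the textured-region grouping mentioned in the text.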

In the second grouping stage, line information from edge detection is
combined with the region information through Gestalt based grouping rules.
The process is depicted in FIG 6. Firstly, regions are checked to see if they
are separated by any edges. Regions that are separated by lines usually
belong to different objects, hence they cannot be grouped together. However,
regions that share a continuous line between them along a common side are
candidates for grouping together. Regions in this formation are generally
understood to be part of the same object. The final outcome is a robust set of
signs consisting of homogeneous regions. The images after the first stage of
segmentation are shown in FIG 7a and after the second stage in FIG 7b. FIG
8a shows the effect on the original images of FIG 4 of the first phase
segmentation process. After second phase grouping, the images appear as in
FIG 8b.

The third phase globally applies the remaining Gestalt grouping
principles of common fate, relative size, symmetry and surroundedness, as
well as good continuity, to the signs produced by the second phase. For
example, regions that are completely surrounded by another region are often
part of the surrounding object and may be grouped together. Regions that are
symmetrical about a given axis can be considered to be components of a
single object. The outcome of this stage is an image structure tree that
describes the image in terms of its component objects made up of signs that
are groups of sub-signs. The image structure tree is the data for the node
metadata file.

The objects are composed of one or more signs that are represented
by each node. The child nodes store the sub-signs while the relationships
between the child nodes are stored in the parent node. Each entry in the tree is
labelled with characteristic features of the image object it identifies, such as
average colour, colour variance, shape chain code, size and its relationship
to other significant objects in the image. The result is that the original
image can be reconstructed with a high degree of accuracy from the
metadata file. The structure tree represents a generalised scene description
of the original image.
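
One possible in-memory form of such a structure tree is sketched below. The class name Node, its field names, the attribute keys and the file name are all hypothetical; the text does not specify a storage format, and JSON is assumed here only as an example of a searchable metadata encoding.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                                       # "image", "object", "sign" or "sub-sign"
    attributes: dict = field(default_factory=dict)  # e.g. average colour, size
    relations: list = field(default_factory=list)   # relationships between children
    children: list = field(default_factory=list)


def to_metadata(node):
    """Convert a tree into plain dicts and lists ready for serialisation."""
    return {
        "kind": node.kind,
        "attributes": node.attributes,
        "relations": node.relations,
        "children": [to_metadata(c) for c in node.children],
    }


def iter_nodes(node, kind):
    """Yield every node of the given kind, depth first."""
    if node.kind == kind:
        yield node
    for child in node.children:
        yield from iter_nodes(child, kind)


def example_face_tree():
    """A face as three small regions embedded in a larger oval region."""
    oval = Node("sub-sign", {"label": "oval", "size": 400})
    inner = [Node("sub-sign", {"label": n, "size": 12}) for n in ("left", "right", "lower")]
    relations = [[n.attributes["label"], "inside", "oval"] for n in inner]
    sign = Node("sign", {"size": 436}, relations, [oval] + inner)
    obj = Node("object", {"size": 436}, [], [sign])
    return Node("image", {"source": "example.png"}, [], [obj])


metadata_text = json.dumps(to_metadata(example_face_tree()))
```

Keeping the relationships on the parent node, as the text describes, means a search can compare two objects structurally without visiting their sub-signs.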

It will be appreciated that the structure tree permits the image to be
analysed at various levels of information in terms of component objects. The
objects may not be strongly semantically significant in themselves; they may
just be a significant portion of a semantic object. For instance, a face can be
identified as three smaller regions embedded in a larger oval shaped region
with particular size and distance relationships between them.

Each of the objects defined in the image structure tree can now be
linked with either other objects in the same tree or with objects in other
images if these have been similarly processed. The associative links are
predominantly formed by finding similar objects in other images with matching
properties. The properties considered include average object colour and
texture, object shape, relationships to other objects and object structural
composition. Various weightings are applied to these properties in the
matching process to produce different linking outcomes. Normally, shape is
assigned the highest weighting since it plays a more significant role in
cognitive similarity matching.

The shape of each object is defined using Fourier shape descriptors
that are invariant to orientation, scale and translation. The process of
obtaining the Fourier shape descriptor for an object involves first locating the
centroid of the object, then mapping its shape using polar coordinates and
finally applying a Fourier transform to the data. By way of example, a discrete
cosine transform may be used in place of the Fourier transform due to its
superior data compaction properties and because it is a purely real transform.
Assuming each object is defined by a binary mask, P(x, y) of M columns and
N rows in size, the centroid C_xy of an object is found by finding the centroid in
each of the x and y directions separately.

After the centroid is located, the distance of the shape boundary
from the centroid is mapped out using polar coordinates at discrete angular
intervals k to obtain n total shape samples. The values are placed in an array
b(k) and a one-dimensional discrete cosine transform (DCT) is applied to it.
The resulting transform coefficients form the final shape vector. The DCT is
defined as:

$$B(u) = \frac{2}{n} \sum_{k=0}^{n-1} b(k) \cos\frac{(2k+1)u\pi}{2n}$$
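
The centroid, polar mapping and transform steps can be sketched in Python as below. Several choices are assumptions of the sketch: the boundary distance in each angular interval is taken as the farthest mask pixel falling in that interval, the DCT is the unnormalised type II form (so coefficients may differ from the transform in the text by a scale factor), and no normalisation for scale or orientation is applied. All function names are hypothetical.

```python
import math


def centroid(mask):
    """Centroid (x, y) of a binary mask given as a list of rows."""
    total = sx = sy = 0
    for y, row in enumerate(mask):
        for x, p in enumerate(row):
            if p:
                total += 1
                sx += x
                sy += y
    return sx / total, sy / total


def radial_signature(mask, n=8):
    """Distance from the centroid to the farthest pixel in each of n angular bins."""
    cx, cy = centroid(mask)
    sig = [0.0] * n
    for y, row in enumerate(mask):
        for x, p in enumerate(row):
            if p:
                dx, dy = x - cx, y - cy
                angle = math.atan2(dy, dx) % (2 * math.pi)
                k = int(angle / (2 * math.pi) * n) % n
                sig[k] = max(sig[k], math.hypot(dx, dy))
    return sig


def dct(b):
    """Unnormalised one-dimensional DCT (type II) of the sequence b."""
    n = len(b)
    return [
        sum(b[k] * math.cos(math.pi * (2 * k + 1) * u / (2 * n)) for k in range(n))
        for u in range(n)
    ]


def shape_vector(mask, n=8):
    """Shape descriptor: DCT coefficients of the radial signature."""
    return dct(radial_signature(mask, n))
```

Measuring distances from the centroid makes the signature independent of where the object sits in the image; very small objects may leave some angular bins empty, which a fuller implementation would need to interpolate.
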

The average colour for each region is stored in the HVC colour
components, as these allow associations that are more cognitively correct
based on the perceptual properties of hue, value and chroma. The definition
of the HVC colour space is given in terms of the CIE Lab space, which is in turn
given in terms of the XYZ space as follows:

The similarity matching process is performed by selecting an initial
image object and performing a weighted mean square error analysis between
its attributes and the corresponding attributes of all other image objects
stored in the system. The object that generates the lowest mean square error
against the current object is selected for linking. To avoid circular references,
a list of links that refer to the current object is maintained and checked
against. If the matched object already has a link that refers to the current
object then that link is rejected and the link is instead formed to the object
with the second lowest mean square error, subject to the same condition. If
no matches are found to be below a certain acceptance threshold then the
object in question remains unlinked. The weighted mean square error
criterion is defined for the original data set x and the comparison set k
defined at each location j over a total of n values as follows:
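
The matching procedure described above may be sketched as follows. The exact form of the weighted mean square error is an assumption here (a per-attribute weight multiplying the squared difference, averaged over n values), as are the function names and the representation of each object as a flat attribute vector.

```python
def weighted_mse(x, k, w):
    """Weighted mean square error between attribute vectors x and k (assumed form)."""
    return sum(wj * (xj - kj) ** 2 for xj, kj, wj in zip(x, k, w)) / len(x)


def link_objects(objects, weights, accept=float("inf")):
    """Link each object to its closest match, avoiding circular references.

    objects maps an object id to its attribute vector. Returns a dict
    mapping each id to the id it links to, or None if it stays unlinked.
    """
    links = {}
    for cur, attrs in objects.items():
        ranked = sorted(
            (weighted_mse(attrs, other, weights), oid)
            for oid, other in objects.items()
            if oid != cur
        )
        links[cur] = None
        for err, oid in ranked:
            if err >= accept:
                break  # no remaining match is below the acceptance threshold
            if links.get(oid) == cur:
                continue  # candidate already links back to the current object
            links[cur] = oid
            break
    return links
```

Raising the weight on the shape attributes relative to colour reproduces the shape-dominant matching the text describes, and returning the whole ranked list instead of its first acceptable entry would give the multiple-target mode.
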

In one embodiment the system performs this matching using only low-level
statistically defined features. This works reasonably well for objects that
are simple and fairly homogeneous. In the case of complex objects formed out
of reasonably dissimilar regions this approach is less effective. In these
situations, when an object is not particularly homogeneous, the saliency of its
statistical attributes is significantly reduced. In their stead, however, these
compound objects tend to exhibit structural saliency. Hence the object
structure can be used for similarity matching. In this case the matching
involves finding objects that are defined similarly in terms of the relationships
between their component regions. The similarity matching no longer involves
comparing the values of individual absolute features but rather the relative
values of a number of features. This is done by directly comparing the object
structure trees of the objects that contain these relationships. At this level the
associations supported by the system are no longer purely feature analytic
but are well on their way to being semantic, based on the structural properties
of the image content.

Unlike automatic hypertext authoring systems where key words may
appear many hundreds of times, creating the need for context and relevance
evaluation in the process of creating links, no two different images will
contain identical objects or attributes. Hence it is extremely unlikely that a
search will reveal two possible matches that have exactly the same degree of
similarity with the reference object. The closest match on each occasion can
therefore be used.

In contrast to existing methods for defining referenceable components
in images for linking, such as manually defining image maps and using object
recognition, that are semantic in nature, the system does not require the
interpretation or recognition of semantics in images and hence it is
completely generic and unconstrained. The node demarcation heavily
exploits Gestalt principles and makes use of an initial segmentation phase
followed by local and global grouping phases to identify semiotic sub-signs
and signs. The associative linking can operate in either static or dynamic
mode and is based on object similarity matching using shape, colour and
relational or structural features to form associations among objects in the
image sets. The navigation support system is capable of automatically
providing a list of multiple alternative targets that the user may wish to
pursue.

Throughout the specification the aim has been to describe the 
invention without limiting the invention to any specific combination of 
features. 

1. A method for generating hyperlinks between images in a set of images
including the steps of:
processing each image to identify nodes within the image;
characterising each node in terms of a discrete set of attributes;
compiling a searchable metadata file of all sets of attributes for all images;
searching the metadata file to identify a node having attributes most closely
related to attributes of a selected node; and
forming a link between the identified node and the selected node.

2. The method of claim 1 wherein the step of identifying nodes further
includes the steps of:
isolating elementary components in an image;
local grouping of the components into signs; and
global grouping of the signs into objects that collectively describe the image.

3. The method of claim 2 wherein the step of isolating elementary 
components is performed by a region growing technique in which pixels are 
added to a region according to a homogeneity criterion and edge information. 

4. The method of claim 2 wherein the step of local grouping includes the
steps of segmenting the components and edge analysis.

5. The method of claim 2 wherein Gestalt grouping principles are 
employed in the local grouping step. 

6. The method of claim 2 wherein the local grouping step is performed in
two stages including a first region labelling and connected component
analysis stage; and a second Gestalt based grouping stage.

7. The method of claim 2 wherein the global grouping step applies
Gestalt grouping principles including one or more of: common fate; relative
size; symmetry; surroundedness; and continuity.

8. The method of claim 1 wherein said searchable metadata file consists
of an image structure tree for each image in said set of images.

9. The method of claim 1 wherein the step of searching the metadata file 
is a similarity matching process. 

10. The method of claim 9 wherein the similarity matching process is
weighted in favour of features of shape.

11. The method of claim 1 wherein the step of forming links between
identified nodes and selected nodes occurs dynamically.

12. The method of claim 1 wherein the step of forming links between
identified nodes and selected nodes occurs statically.

13. A method of forming an image structure tree for an image in an image
hyperlinking system that generates hyperlinks between images in a set of
images, including the steps of:
segmenting said image by isolating elementary components in said image;
local grouping of the components into signs;
global grouping of the signs into objects; and
arranging the objects, signs and components into an image structure tree to
describe the image.